Statistical Inference for Online Learning and Stochastic Approximation via Hierarchical Incremental Gradient Descent
Authors
Abstract
Stochastic gradient descent (SGD) is an immensely popular approach for online learning in settings where data arrives in a stream or data sizes are very large. However, despite an ever-increasing volume of work on SGD, much less is known about the statistical inferential properties of SGD-based predictions. Taking a fully inferential viewpoint, this paper introduces a novel procedure termed HiGrad to conduct statistical inference for online learning, without incurring additional computational cost compared with SGD. The HiGrad procedure begins by performing SGD updates for a while and then splits the single thread into several threads, and it operates hierarchically in this fashion along each thread. With predictions provided by multiple threads in place, a t-based confidence interval is constructed by decorrelating predictions using covariance structures given by the Ruppert–Polyak averaging scheme. Under certain regularity conditions, the HiGrad confidence interval is shown to attain asymptotically exact coverage probability. Finally, the performance of HiGrad is evaluated through extensive simulation studies and a real data example. An R package higrad has been developed to implement the method.
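The thread-splitting idea described in the abstract can be sketched in a few lines. This is a simplified illustration only — a toy one-parameter least-squares stream, an assumed step size, and a plain t-interval over thread averages rather than the decorrelated HiGrad interval — not the higrad package's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

theta_true = 2.0  # toy streaming model: y = theta * x + noise

def sgd_segment(theta, n_steps, step=0.05):
    """Run n_steps of SGD on fresh (x, y) draws; return the final iterate
    and the Ruppert-Polyak average of the iterates over the segment."""
    avg = 0.0
    for t in range(n_steps):
        x = rng.normal()
        y = theta_true * x + rng.normal()
        theta -= step * (theta * x - y) * x   # gradient of 0.5*(theta*x - y)^2
        avg += (theta - avg) / (t + 1)        # running average of iterates
    return theta, avg

# Stage 1: a single shared thread.
theta0, _ = sgd_segment(0.0, 2000)

# Stage 2: split into K threads that continue independently from theta0.
K = 4
thread_estimates = [sgd_segment(theta0, 2000)[1] for _ in range(K)]

# Simplified t-interval over the K thread averages (HiGrad proper instead
# decorrelates the threads using the known covariance structure).
est = float(np.mean(thread_estimates))
se = float(np.std(thread_estimates, ddof=1)) / np.sqrt(K)
t_crit = 3.182  # two-sided 95% t quantile with K - 1 = 3 degrees of freedom
ci = (est - t_crit * se, est + t_crit * se)
print(est, ci)
```

Note that the threads here share only the common starting point; the paper's contribution is precisely the covariance accounting that makes the resulting interval asymptotically exact.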
Similar resources
Online Algorithm for Orthogonal Regression
In this paper, we introduce a new online algorithm for orthogonal regression. The method is constructed via a stochastic gradient descent approach combined with the idea of a tube loss function, which is similar to the one used in support vector (SV) regression. The algorithm can be used in primal or in dual variables. The latter formulation allows the introduction of kernels and soft margins. ...
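The tube-loss SGD described in this abstract might look roughly like the following in the primal. The parameterization (unit normal w, offset b), the projection step, and all constants are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Line parameterized as w . x = b with ||w|| = 1, so |w . x - b| is the
# orthogonal distance from x to the line; the tube (epsilon-insensitive)
# loss ignores points within distance eps of the line.
eps, lr = 0.1, 0.01
w = np.array([1.0, 0.0])
b = 0.0

# Stream of noisy points near the line x2 = x1.
for _ in range(5000):
    t = rng.uniform(-2.0, 2.0)
    x = np.array([t, t]) + 0.05 * rng.normal(size=2)
    r = w @ x - b
    if abs(r) > eps:            # outside the tube: take a subgradient step
        w -= lr * np.sign(r) * x
        b += lr * np.sign(r)
    w /= np.linalg.norm(w)      # project w back onto the unit sphere

print(w, b)
```

The dual formulation mentioned in the abstract, which admits kernels and soft margins, is not sketched here.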
Incremental Natural Actor-Critic Algorithms
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods...
Adaptive regularization for Lasso models in the context of non-stationary data streams
Large scale, streaming datasets are ubiquitous in modern machine learning. Streaming algorithms must be scalable, amenable to incremental training and robust to the presence of non-stationarity. In this work we consider the problem of learning ℓ1-regularized linear models in the context of streaming data. In particular, the focus of this work revolves around how to select the regularization parame...
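One common way to realize streaming ℓ1-regularized fitting is proximal SGD — a plain gradient step on the squared loss followed by soft-thresholding — sketched below. The fixed penalty `lam` and step size are illustrative assumptions; the paper's adaptive, non-stationarity-aware choice of the regularization parameter is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sparse ground truth; only the first two coordinates are active.
w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0])
w = np.zeros(5)
lr, lam = 0.01, 0.05   # step size and (fixed) l1 penalty -- assumptions

for _ in range(10000):
    x = rng.normal(size=5)
    y = x @ w_true + 0.1 * rng.normal()
    w -= lr * (w @ x - y) * x                               # SGD step on squared loss
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # prox of lam * ||w||_1

print(np.round(w, 2))
```

The soft-thresholding step drives inactive coordinates to (near) zero while shrinking the active ones slightly toward zero, which is the usual lasso bias.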
Learning for Deep Neural Networks
In industrial machine learning pipelines, data often arrive in parts. Particularly in the case of deep neural networks, it may be too expensive to train the model from scratch each time, so one would rather use a previously learned model and the new data to improve performance. However, deep neural networks are prone to getting stuck in a suboptimal solution when trained on only new data as com...
Importance Sampled Stochastic Optimization for Variational Inference
Variational inference approximates the posterior distribution of a probabilistic model with a parameterized density by maximizing a lower bound for the model evidence. Modern solutions fit a flexible approximation with stochastic gradient descent, using Monte Carlo approximation for the gradients. This enables variational inference for arbitrary differentiable probabilistic models, and conseque...
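The Monte Carlo gradient idea in this abstract can be illustrated with the reparameterization trick on a toy conjugate model whose exact posterior is known in closed form (prior N(0, 1), likelihood N(theta, 1), one observation y = 1, so the posterior is N(0.5, 0.5)). Names, step sizes, and the manual gradients are assumptions for illustration; practical variational inference would use automatic differentiation.

```python
import math

import numpy as np

rng = np.random.default_rng(2)

# Fit q = N(mu, sigma^2) by stochastic gradient ascent on the ELBO,
# using the reparameterization theta = mu + sigma * eps, eps ~ N(0, 1).
y = 1.0
mu, rho = 0.0, 0.0   # rho parameterizes sigma = exp(rho)
lr = 0.01

for _ in range(20000):
    sigma = math.exp(rho)
    eps = rng.normal()
    theta = mu + sigma * eps           # reparameterized sample from q
    dlogp = (y - theta) - theta        # d/dtheta [log p(y|theta) + log p(theta)]
    mu += lr * dlogp                   # chain rule: dtheta/dmu = 1
    rho += lr * (dlogp * eps * sigma + 1.0)  # dtheta/drho = eps*sigma, entropy term +1

print(mu, math.exp(rho) ** 2)   # compare with posterior mean 0.5, variance 0.5
```

At the optimum the expected gradients vanish at mu = 0.5 and sigma^2 = 0.5, matching the exact posterior, which makes the toy model a convenient correctness check.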
Journal: CoRR
Volume: abs/1802.04876
Pages: -
Publication date: 2018